Individuals or families may be looking to move to Melbourne, and they may well be unsure which locations would suit them best. Families may want to move to areas with good schools, adventure seekers may be looking for locations with future planned activities, and workers may be looking for a location with the best predicted job growth.
Business owners may be looking to predict which suburbs or locations are expected to grow in the future; however, they often lack the resources to make that prediction. Real estate investors may want to invest in areas with high demand to increase the ROI on their property, and small business owners may want to branch out into a new area with optimistic economic prospects.
At the end of this use case you will:
Melbourne was named the most liveable city in Australia, and 10th in the world, according to the Global Liveability Index 2022. The city achieved a perfect score for education (100/100) as well as infrastructure (100/100). Furthermore, it scored highly for Culture and Environment (98.6/100) and Stability (95/100) (Study Melbourne). This information is great for those living, or wishing to live, in Melbourne; however, it doesn't detail the specific locations and data that make Melbourne such a great city to live in. How can people wanting to move to Melbourne visualise exactly where these liveability metrics come from?
Visualisations provide a quick and easy way to interpret large amounts of data, and various insights can be drawn from this method of analysis. As discussed above, there are currently no visualisations that let individuals or businesses see liveability characteristics on a map; instead, simple figures are released that do not tell the full story.
The CoM has various datasets that would contribute to Melbourne's overall liveability. Specifically, they can be aligned with the following:
Key Factors of Liveability
The goal of this analysis is to reveal individual suburbs, blocks and locations around Melbourne that would score high on liveability based on the above factors. You will be able to visualise why Melbourne scores so high on liveability, and dive deeper into specific locations that would contribute to this overall success.
*CoM Datasets*
*Vic Government Datasets*
Table of Contents
To begin we shall first import the necessary libraries to support our data analysis and visualisation using Melbourne Open data.
# Standard
import os
import json
# Data import
import urllib
from urllib.request import urlopen
import requests
from sodapy import Socrata
# Data manipulation
import pandas as pd
# Plotting
import plotly.graph_objs as go
import plotly.express as px
To connect to the Melbourne Open Data Portal and gather data, we use v2 of their API. In this method, we take the unique dataset ID (usually the name, as seen below after /datasets/) and build a custom export URL.
# Job forecast data
jf_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/city-of-melbourne-jobs-forecasts-by-small-area-2020-2040/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(jf_url)
response = r.json()
jf_data = pd.DataFrame(response)
# Development data
dev_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/development-activity-monitor/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(dev_url)
response = r.json()
dev_data = pd.DataFrame(response)
# Free and cheap support services
ss_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/free-and-cheap-support-services-with-opening-hours-public-transport-and-parking-/exports/json?limit=-1&offset=0&timezone=UTC'
r = requests.get(ss_url)
response = r.json()
ss_data = pd.DataFrame(response)
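The three downloads above repeat the same request pattern; a small helper could keep the notebook tidy. This is a sketch, with a hypothetical `fetch_dataset` name, assuming the same v2 export endpoint used above:

```python
import pandas as pd
import requests

BASE = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets'

def fetch_dataset(dataset_id: str) -> pd.DataFrame:
    """Download a dataset's JSON export and return it as a DataFrame."""
    url = f'{BASE}/{dataset_id}/exports/json?limit=-1&offset=0&timezone=UTC'
    r = requests.get(url)
    r.raise_for_status()  # surface HTTP errors instead of parsing bad JSON
    return pd.DataFrame(r.json())

# e.g. jf_data = fetch_dataset('city-of-melbourne-jobs-forecasts-by-small-area-2020-2040')
```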
We also have to connect to the School Locations 2022 dataset from the Victorian Government. To do so, we will use the standard urllib library to connect to the data API (URL imports). This method opens the URL and gathers the response (data) into a dataframe.
# School Locations 2022
url = 'https://www.education.vic.gov.au/Documents/about/research/datavic/dv331_schoollocations2022.csv' # url
with urllib.request.urlopen(url) as response:
    sl_data = pd.read_csv(response, encoding='cp1252') # dataframe
Next, we will look at one specific dataset to better understand its structure and how we can use it. For this exercise, we will observe the City of Melbourne Jobs Forecasts by Small Area 2020-2040 dataset; specifically, its first ten rows.
# Print details of data
print(f'The shape of the dataset is: {jf_data.shape}')
print()
print('The first ten rows of this dataset are:')
# Print the first 10 rows of data
jf_data.head(10)
The shape of the dataset is: (9114, 5) The first ten rows of this dataset are:
| geography | year | category | industry_space_use | value | |
|---|---|---|---|---|---|
| 0 | City of Melbourne | 2024 | Jobs by industry | Accommodation | 10734 |
| 1 | City of Melbourne | 2027 | Jobs by industry | Accommodation | 11913 |
| 2 | City of Melbourne | 2029 | Jobs by industry | Accommodation | 12489 |
| 3 | City of Melbourne | 2030 | Jobs by industry | Accommodation | 12785 |
| 4 | City of Melbourne | 2031 | Jobs by industry | Accommodation | 13086 |
| 5 | City of Melbourne | 2033 | Jobs by industry | Accommodation | 13313 |
| 6 | City of Melbourne | 2037 | Jobs by industry | Accommodation | 13739 |
| 7 | City of Melbourne | 2040 | Jobs by industry | Accommodation | 14053 |
| 8 | City of Melbourne | 2041 | Jobs by industry | Accommodation | 14162 |
| 9 | City of Melbourne | 2024 | Jobs by industry | Admin and support services | 17343 |
We can see that there are 9114 records and 5 fields describing each record. Each record can be broken down into the following fields:
Awesome! After taking a look at one of the CoM datasets, we can see its overall structure and contents. Let's now begin our analysis of each individual dataset.
We are now going to analyse each individual dataset so we can generate useful information and visualisations from them. This will assist us when producing the final interactive maps and predictions later on.
We are going to visualise the job forecasts in each area by 2040, i.e., all jobs from 2020-2040 for each location listed under 'geography' above. First, however, we must create a summary of the data.
# Cast datatypes to correct type so we can analyse and summarise
jf_data[['year', 'value']] = jf_data[['year', 'value']].astype(int)
jf_data = jf_data.convert_dtypes()
# Create summary data frame
# Group data by geography field, and aggregate by sum of forecasted jobs
jobsByArea = pd.DataFrame(jf_data.groupby('geography', as_index=False).agg({'value': ['sum']}))
# DataFrame groupby creates two lines of headings
# We flatten the headings to make it easier to extract data for plotting
jobsByArea.columns = jobsByArea.columns.map(''.join) # flatten column header
jobsByArea.rename(columns={'geography': 'featurenam', 'valuesum': 'forecasted_jobs'}, inplace=True) #rename to match GeoJSON format
# Remove the 'City of Melbourne' row as it is the sum of all areas and isn't required
# Filtering by name is safer than dropping by position
jobsByArea = jobsByArea[jobsByArea['featurenam'] != 'City of Melbourne']
jobsByArea
| featurenam | forecasted_jobs | |
|---|---|---|
| 0 | Carlton | 1855047 |
| 2 | Docklands | 7081294 |
| 3 | East Melbourne | 2090532 |
| 4 | Kensington | 872976 |
| 5 | Melbourne (CBD) | 21902917 |
| 6 | Melbourne (Remainder) | 2304954 |
| 7 | North Melbourne | 1520826 |
| 8 | Parkville | 2918907 |
| 9 | Port Melbourne | 1681915 |
| 10 | South Yarra | 134480 |
| 11 | Southbank | 4417452 |
| 12 | West Melbourne (Industrial) | 422335 |
| 13 | West Melbourne (Residential) | 530041 |
We can now see the total forecasted jobs for each area up until 2040.
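As an aside, the flatten-and-rename steps above can be avoided with pandas named aggregation, which produces a flat, single-level header in one call. A sketch on a tiny hand-made sample (the rename to 'featurenam' would still follow as before):

```python
import pandas as pd

# Miniature stand-in for jf_data with just the two fields the summary uses.
df = pd.DataFrame({
    'geography': ['Carlton', 'Carlton', 'Docklands'],
    'value': [100, 50, 200],
})

# Named aggregation: output column on the left, (source column, function)
# tuple on the right. No MultiIndex header is created.
summary = df.groupby('geography', as_index=False).agg(
    forecasted_jobs=('value', 'sum')
)
print(summary)  # one row per geography with a flat 'forecasted_jobs' column
```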
Next, we are going to visualise the forecasted jobs for each area on a choropleth map. Creating a choropleth map requires us to know the geometry (shape) of each area as a collection of latitude and longitude points defining a polygon. This data can be downloaded from the Melbourne Open Data Portal in GeoJSON format. For our data, we are using the Small Areas for Census of Land Use and Employment (CLUE) data, as this aligns with the forecasted jobs dataframe we have. This dataset contains the spatial boundary definitions for the areas in the original 'geography' column of our dataset.
Below we extract the Small Areas for Census Land Use and Employment (CLUE) data.
area_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/small-areas-for-census-of-land-use-and-employment-clue/exports/geojson?limit=-1&offset=0&timezone=UTC'
r = requests.get(area_url)
response = r.json()
area = response
# Display the unique keys for each spatial boundary
area['features'][0]['properties'].keys()
dict_keys(['geo_point_2d', 'featurenam', 'shape_area', 'shape_len'])
We can see that the 'featurenam' field in the CLUE area data contains the specific area of the given spatial boundary. We are going to use this to match with our jobsByArea dataframe (remember we renamed the 'geography' column to 'featurenam' so it aligns with the CLUE data).
Simply put, our original dataframe has labelled string areas (Carlton, Docklands, etc.) without spatial definitions (latitude/longitude polygons). To plot these areas, we need the spatial definitions, which are present in the CLUE dataset. We essentially match the labelled area strings between the datasets as a key, allowing us to plot our original data using the polygons (spatial definitions) in the CLUE data. The key, as mentioned above, is 'featurenam', which holds the labelled area strings (Carlton, Docklands, etc.).
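Because the join between the dataframe and the GeoJSON happens silently inside plotly, any name mismatch (a typo, stray whitespace) simply drops that polygon from the map. A quick sanity check is worthwhile; the sketch below uses small mock stand-ins in place of the real `area` GeoJSON and `jobsByArea` frame:

```python
import pandas as pd

# Mock stand-ins; in the notebook, use the real `area` and `jobsByArea`.
area = {'features': [
    {'properties': {'featurenam': 'Carlton'}},
    {'properties': {'featurenam': 'Docklands'}},
]}
jobsByArea = pd.DataFrame({'featurenam': ['Carlton', 'Dockland']})  # note the typo

# Collect every area name present in the GeoJSON, then flag dataframe
# names with no matching polygon.
geo_names = {f['properties']['featurenam'] for f in area['features']}
missing = set(jobsByArea['featurenam']) - geo_names
print('Areas with no matching polygon:', missing)  # flags 'Dockland'
```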
Now, using the choropleth_mapbox function, we can display a map using the CLUE data (GeoJSON) to define the regions and the jobsByArea dataframe to define the summarised data by area.
# Display the choropleth map
fig = px.choropleth_mapbox(jobsByArea, # pass in the summarised jobs by area
geojson=area, # pass in the GeoJSON data defining the areas
locations='featurenam', # define the unique identifier for the areas from the dataframe
color='forecasted_jobs', # change the colour of the area according to the forecasted jobs
color_continuous_scale=["red", "orangered", "orange",
"yellow", "greenyellow", "green"], # define custom colour scale
range_color=(0, jobsByArea['forecasted_jobs'].max()), # set the numeric range for the colour scale
featureidkey="properties.featurenam", # define the Unique polygon identifier from the GeoJSON data
mapbox_style="carto-darkmatter", # set the visual style of the map
zoom=11.9, # set the zoom level
center = {"lat": -37.813, "lon": 144.945}, # set the map centre coordinates
opacity=0.3, # opacity of the choropleth polygons
hover_name='featurenam', # the title of the hover pop up box
hover_data={'featurenam':True,'forecasted_jobs':True}, # data in popup box
labels={'forecasted_jobs':'Forecasted Jobs','featurenam':'Area'}, # labels for popup box
title='New Forecasted Jobs by 2040', # Title for plot
width=950, height=800 # dimensions of plot
)
fig.show()
We now have a visualisation of the forecasted jobs by 2040 in each area around Melbourne!
School location data is taken from Data Victoria. This is not CoM data; however, it is still useful and will help us in our analysis.
For analysis and visualisation of the school locations, we are going to map all schools within a specific area of the CBD so it is useful for our end visualisation - if we plot all school data, our map will have points located outside of Melbourne which isn't useful.
First, we are going to have a look at the data to see its characteristics.
sl_data.head(5)
| Education_Sector | Entity_Type | SCHOOL_NO | School_Name | School_Type | School_Status | Address_Line_1 | Address_Line_2 | Address_Town | Address_State | ... | Postal_Address_Line_1 | Postal_Address_Line_2 | Postal_Town | Postal_State | Postal_Postcode | Full_Phone_No | LGA_ID | LGA_Name | X | Y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Government | 1 | 1 | Alberton Primary School | Primary | O | 21 Thomson Street | NaN | Alberton | VIC | ... | 21 Thomson Street | NaN | ALBERTON | VIC | 3971 | 03 5183 2412 | 681 | Wellington (S) | 146.66660 | -38.61771 |
| 1 | Government | 1 | 3 | Allansford and District Primary School | Primary | O | Frank Street | NaN | Allansford | VIC | ... | Frank Street | NaN | ALLANSFORD | VIC | 3277 | 03 5565 1382 | 673 | Warrnambool (C) | 142.59039 | -38.38628 |
| 2 | Government | 1 | 4 | Avoca Primary School | Primary | O | 118 Barnett Street | NaN | Avoca | VIC | ... | P O Box 12 | NaN | AVOCA | VIC | 3467 | 03 5465 3176 | 599 | Pyrenees (S) | 143.47565 | -37.08450 |
| 3 | Government | 1 | 8 | Avenel Primary School | Primary | O | 40 Anderson Street | NaN | Avenel | VIC | ... | 40 Anderson Street | NaN | AVENEL | VIC | 3664 | 03 5796 2264 | 643 | Strathbogie (S) | 145.23472 | -36.90137 |
| 4 | Government | 1 | 12 | Warrandyte Primary School | Primary | O | 5-11 Forbes Street | NaN | Warrandyte | VIC | ... | 5-11 Forbes Street | NaN | WARRANDYTE | VIC | 3113 | 03 9844 3537 | 421 | Manningham (C) | 145.21398 | -37.74268 |
5 rows × 21 columns
As seen above, there are many columns that won't be useful for our analysis. Let's remove these by creating a new dataframe and storing only the important columns. Remember, we are only focused on plotting the locations of the schools, therefore, most of the data inside the dataset isn't useful.
*Note: The address of each school (Address_Line_1) may appear useful since we want to plot locations; however, when using plotly and Python to plot geodata, latitude and longitude are best used.*
# Keep only the required columns for plotting
sl_data = sl_data[['School_Name', 'School_Type', 'X', 'Y']]
# Remove null data
sl_data = sl_data.dropna()
# Reset the index of data frame
sl_data = sl_data.reset_index(drop=True)
sl_data.head(5)
| School_Name | School_Type | X | Y | |
|---|---|---|---|---|
| 0 | Alberton Primary School | Primary | 146.66660 | -38.61771 |
| 1 | Allansford and District Primary School | Primary | 142.59039 | -38.38628 |
| 2 | Avoca Primary School | Primary | 143.47565 | -37.08450 |
| 3 | Avenel Primary School | Primary | 145.23472 | -36.90137 |
| 4 | Warrandyte Primary School | Primary | 145.21398 | -37.74268 |
Great! Now we have the data required to plot each school and its type around Melbourne. We are going to use plotly express to create a scatter mapbox, which will plot each school's location using the longitude (X) and latitude (Y) columns seen in the dataframe above.
However, before we do this, we need to remove some schools from our data. Currently, the data contains all schools in Victoria, meaning that our map would show all of Victoria. We are only interested in Melbourne; specifically, we want a perimeter of schools around the CoM CBD. To do this, we define a bounding box around the Melbourne CBD, sized to match our map view.
We are going to remove all schools that fall outside these bounds:
# Remove if longitude is greater than or less than
sl_data.drop(sl_data[sl_data['X'] < 144.88824].index, inplace = True)
sl_data.drop(sl_data[sl_data['X'] > 145.00226].index, inplace = True)
# Remove if latitude is greater than or less than
sl_data.drop(sl_data[sl_data['Y'] > -37.77682].index, inplace = True)
sl_data.drop(sl_data[sl_data['Y'] < -37.85019].index, inplace = True)
sl_data
| School_Name | School_Type | X | Y | |
|---|---|---|---|---|
| 29 | Flemington Primary School | Primary | 144.93392 | -37.78067 |
| 30 | Footscray Primary School | Primary | 144.89267 | -37.79838 |
| 51 | Fitzroy Primary School | Primary | 144.98151 | -37.79960 |
| 70 | South Yarra Primary School | Primary | 144.98562 | -37.84135 |
| 183 | Albert Park Primary School | Primary | 144.95277 | -37.84188 |
| ... | ... | ... | ... | ... |
| 2233 | St Joseph's Flexible Learning Centre Melbourne | Special | 144.95494 | -37.80384 |
| 2251 | Melbourne Indigenous Transition School | Special | 144.98926 | -37.81954 |
| 2262 | River Nile School | Special | 144.95509 | -37.80519 |
| 2264 | Hester Hornbrook Academy | Special | 144.95615 | -37.81654 |
| 2290 | Ignatius Learning Centre | Secondary | 144.99848 | -37.82191 |
77 rows × 4 columns
Great! Now we can see that there are only 77 schools left, which seems about right. Let's now plot them to see:
fig2 = px.scatter_mapbox(sl_data, lat='Y', lon='X', # plot on latitude and longitude
mapbox_style="carto-darkmatter", # style of map
zoom=12.15, # set initial zoom
center = {"lat": -37.813, "lon": 144.945}, # centre of the map
opacity=0.7, # opacity of each marker/dot
hover_name="School_Name", # only display the school name when hovered
hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
color = 'School_Type', # each school type has a different colour
color_discrete_sequence=['mediumorchid', 'blue', 'red', 'aqua',
'limegreen', 'orange'], # school colours
labels={'School_Name':'School Name', 'School_Type':'School Type'}, # change labels
title = 'School Locations 2022', # title of map
width=950, height=800) # size of map
fig2.update_traces(marker={'size': 10}) # change the marker size
fig2.show()
We can now see the school locations around Melbourne. Each school type corresponds to a colour, and we can see the school name when we hover our mouse over each point!
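Strictly speaking, the filter we applied is a rectangular bounding box rather than a true radius. If a genuine radial cut-off were wanted, the haversine great-circle distance could be used instead. A sketch; the 5 km cut-off is an arbitrary assumption, not something from the dataset:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))  # Earth radius ~6371 km

CBD_LAT, CBD_LON = -37.813, 144.945  # map centre used throughout

# In the notebook this would replace the four drop() calls, e.g.:
# keep = sl_data.apply(
#     lambda r: haversine_km(r['Y'], r['X'], CBD_LAT, CBD_LON) <= 5, axis=1)
# sl_data = sl_data[keep]
```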
Next, we are going to look at the Development Activity Monitor dataset. This dataset tracks new commercial and residential property development in the City of Melbourne. Because access to housing is a key factor in liveability, we are going to focus specifically on the residential side of this dataset.
Let's print the first five rows of the dataset to see what it looks like:
dev_data.head(5)
| data_format | development_key | status | year_completed | clue_small_area | clue_block | street_address | property_id | property_id_2 | property_id_3 | ... | hospital_flr | recreation_flr | publicdispaly_flr | community_flr | car_spaces | bike_spaces | town_planning_application | longitude | latitude | geopoint | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pre May 16 | X000568 | COMPLETED | 2012 | West Melbourne (Residential) | 411 | 1-13 Abbotsford Street WEST MELBOURNE VIC 3003 | 100001 | None | None | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 144.943280 | -37.807920 | {'lon': 144.9432805, 'lat': -37.80791988} |
| 1 | Pre May 16 | X000557 | COMPLETED | 2002 | West Melbourne (Residential) | 401 | 7-21 Anderson Street WEST MELBOURNE VIC 3003 | 100435 | None | None | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 144.941547 | -37.804777 | {'lon': 144.9415469, 'lat': -37.80477682} |
| 2 | Pre May 16 | X000448 | COMPLETED | 2015 | North Melbourne | 314 | 302-308 Arden Street NORTH MELBOURNE VIC 3051 | 100509 | None | None | ... | 0 | 0 | 0 | 0 | 24 | 6 | 0 | 144.937724 | -37.799250 | {'lon': 144.9377236, 'lat': -37.79925034} |
| 3 | Pre May 16 | X000458 | COMPLETED | 2004 | North Melbourne | 330 | 162-168 Arden Street NORTH MELBOURNE VIC 3051 | 100519 | None | None | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 144.946228 | -37.800320 | {'lon': 144.9462277, 'lat': -37.80032041} |
| 4 | Pre May 16 | X000996 | COMPLETED | 2013 | North Melbourne | 1012 | 201 Arden Street NORTH MELBOURNE VIC 3051 | 100552 | None | None | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 144.941047 | -37.800299 | {'lon': 144.9410467, 'lat': -37.80029861} |
5 rows × 42 columns
This dataset has many columns (42), as seen above. We only want to focus on residential properties, so we should remove all property information except 'resi_dwellings'. Keep in mind we still want to keep the CLUE Small Area and CLUE Block, because we are going to visualise the number of residential dwellings in each block (similar to what we did in the Job Forecast analysis).
# Keep only the required columns to perform visualisation
dev_data = dev_data[['clue_small_area', 'clue_block', 'resi_dwellings']]
dev_data.head(5)
| clue_small_area | clue_block | resi_dwellings | |
|---|---|---|---|
| 0 | West Melbourne (Residential) | 411 | 10 |
| 1 | West Melbourne (Residential) | 401 | 31 |
| 2 | North Melbourne | 314 | 0 |
| 3 | North Melbourne | 330 | 16 |
| 4 | North Melbourne | 1012 | 0 |
Awesome! The dataset now only contains the information we need to create a choropleth map. We have the CLUE Small Area, the CLUE Block ID, and the number of residential dwellings for the given block/area.
However, before we can create the map, we need to perform grouping and aggregation to gather the dwelling counts for each block:
# Cast datatypes
dev_data['resi_dwellings'] = dev_data['resi_dwellings'].astype(int)
dev_data = dev_data.convert_dtypes()
# Group by fields
groupby = ['clue_block', 'clue_small_area']
# Aggregate by fields
aggregateby = {'resi_dwellings': ['sum']}
# Perform grouping and aggregation
dwellingsByBlock = pd.DataFrame(dev_data.groupby(groupby, as_index=False).agg(aggregateby))
dwellingsByBlock.columns = dwellingsByBlock.columns.map(''.join) # flatten column header
dwellingsByBlock.rename(columns={'clue_small_area': 'clue_area'}, inplace=True) # rename to match GeoJSON extract
dwellingsByBlock.rename(columns={'resi_dwellingssum': 'dwelling_count'}, inplace=True)
dwellingsByBlock.head(5)
| clue_block | clue_area | dwelling_count | |
|---|---|---|---|
| 0 | 1 | Melbourne (CBD) | 385 |
| 1 | 2 | Melbourne (CBD) | 0 |
| 2 | 6 | Melbourne (CBD) | 0 |
| 3 | 11 | Melbourne (CBD) | 706 |
| 4 | 12 | Melbourne (CBD) | 33 |
Similar to the job forecast, we need the CLUE area information to generate the choropleth map. This time, though, we want to plot the dwellings by CLUE Block, not by CLUE Area.
We therefore need to import the geometry (shape) of each CLUE Block as collections of latitude and longitude points. This data can be gathered from the Melbourne Open Data Portal in GeoJSON format. Below we extract the required CLUE Block data.
block_url = 'https://data.melbourne.vic.gov.au/api/v2/catalog/datasets/blocks-for-census-of-land-use-and-employment-clue/exports/geojson?limit=-1&offset=0&timezone=UTC'
r = requests.get(block_url)
response = r.json()
block = response
We can see that the 'block_id' field in the CLUE Block data contains the specific block of the given spatial boundary. We are going to use this to match with our dwellingsByBlock dataframe (the clue_block field).
Note: The 'block_id' field in the CLUE Block data is the equivalent of the 'clue_block' field in the dwellingsByBlock data. These two matching fields act as a key between the datasets, letting us map the dwelling count for each block, since the CLUE Block dataset contains the actual geodata we need to draw the blocks.
Now, using the choropleth_mapbox function, we can display a map using the Block GeoJSON data to define the regions and the dwellingsByBlock dataframe to define the summarised data by block.
# Display the choropleth map
fig4 = px.choropleth_mapbox(dwellingsByBlock, # pass in the dwellings data
geojson=block, # pass in the block data
locations='clue_block', # locations of dwellings are clue_block (block_id)
color='dwelling_count', # colour corresponding to dwelling count of each block
color_continuous_scale=["red", "orangered", "orange",
"yellow", "greenyellow", "green"], # colour scale
range_color=(0, dwellingsByBlock['dwelling_count'].max()), # range of colour scale
featureidkey="properties.block_id", # match the block_id to clue_block
mapbox_style="carto-darkmatter", # style of map
zoom=12.15, # initial zoom
center = {"lat": -37.813, "lon": 144.945}, # centre of map
opacity=0.22, # opacity of highlighted blocks
hover_name='clue_area', # area displayed when hovered over
hover_data={'clue_block':True,'dwelling_count':True}, # data displayed when hovered over
labels={'dwelling_count':'Residential Dwellings', 'clue_block':'CLUE Block ID'}, # label changes
title='Completed and Under-Construction Residential Dwellings 2022', # title of map
width=950, height=800 # size of map
)
fig4.show()
You've now successfully plotted the number of residential dwellings by block in the City of Melbourne!
One key component of liveability is the number of support services, such as hospitals, near a given location, so it is important to include this in our visualisation. This dataset may be somewhat dated, but given that free and cheap support services rarely change location, the data is satisfactory.
Let's take a look at the data:
ss_data.head(5)
| name | what | who | address_1 | address_2 | suburb | phone | phone_2 | free_call | ... | nearest_train_station | category_1 | category_2 | category_3 | category_4 | category_5 | category_6 | longitude | latitude | geocoded_location | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Child Protection Emergency Service | None | None | None | None | None | 13 12 78 | None | None | None | ... | None | Helpful phone number | N/A | N/A | N/A | N/A | N/A | NaN | NaN | None |
| 1 | Gamblers Help Line Victoria | None | None | None | None | None | None | None | 1800 858 858 | None | ... | None | Helpful phone number | N/A | N/A | N/A | N/A | N/A | NaN | NaN | None |
| 2 | Kids Help line | None | None | None | None | None | None | None | 1800 551 800 | None | ... | None | Helpful phone number | N/A | N/A | N/A | N/A | N/A | NaN | NaN | None |
| 3 | Lifeline (24 hour crisis counselling) | None | None | None | None | None | 13 11 14 | None | None | None | ... | None | Helpful phone number | N/A | N/A | N/A | N/A | N/A | NaN | NaN | None |
| 4 | Narcotics Anonymous - Victorian Area Helpline | None | None | None | None | None | 9525 2833 | None | None | info@navic.net.au | ... | None | Helpful phone number | Helpful website | N/A | N/A | N/A | N/A | NaN | NaN | None |
5 rows × 34 columns
Awesome! We can now see the structure of the data. We are focused on creating a visual of these locations; therefore, we only want to keep the name, service, and latitude/longitude columns:
# Keep specific data
ss_data = ss_data[['name', 'what', 'latitude', 'longitude']]
# Print
ss_data.head(5)
| name | what | latitude | longitude | |
|---|---|---|---|---|
| 0 | Child Protection Emergency Service | None | NaN | NaN |
| 1 | Gamblers Help Line Victoria | None | NaN | NaN |
| 2 | Kids Help line | None | NaN | NaN |
| 3 | Lifeline (24 hour crisis counselling) | None | NaN | NaN |
| 4 | Narcotics Anonymous - Victorian Area Helpline | None | NaN | NaN |
The data now contains only useful information for plotting each of the locations.
However, we can see from above that some of our data doesn't contain latitude and longitude. This is because these services are online or phone based, for example, the Child Protection Emergency Service. We are only focused on bricks-and-mortar services. Let's remove any rows that don't contain latitude and longitude:
# Remove rows without coordinates (restrict dropna to the coordinate columns)
ss_data = ss_data.dropna(subset=['latitude', 'longitude'])
# Reset the index of data frame
ss_data = ss_data.reset_index(drop=True)
ss_data.head(5)
| name | what | latitude | longitude | |
|---|---|---|---|---|
| 0 | Aboriginal Family Violence Prevention and Lega... | Legal Services, Counselling Support, Informati... | -37.806427 | 144.986299 |
| 1 | Alcoholics Anonymous Victoria | AA is a fellowship of men and women who share ... | -37.811648 | 145.000307 |
| 2 | Royal Melbourne Hospital | Outpatients’ emergency service | -37.798877 | 144.956177 |
| 3 | Anglicare Victoria – St.Mark’s Community Centre | St Mark’s provides assistance to homeless peop... | -37.801611 | 144.981835 |
| 4 | Brotherhood of St Laurence Coolibah Centre | Breakfast $1.00 \nlunch $3, afternoon tea $0.2... | -37.805286 | 144.977265 |
Awesome! We now have the data in the required format to plot the locations of the services. Let's again remove the data that isn't within a radius of the CBD, applying the same bounds as for the school locations:
# Get data into required type
ss_data['longitude'] = ss_data['longitude'].astype(float)
ss_data['latitude'] = ss_data['latitude'].astype(float)
# Remove if longitude is greater than or less than
ss_data.drop(ss_data[ss_data['longitude'] < 144.88824].index, inplace = True)
ss_data.drop(ss_data[ss_data['longitude'] > 145.00226].index, inplace = True)
# Remove if latitude is greater than or less than
ss_data.drop(ss_data[ss_data['latitude'] > -37.77682].index, inplace = True)
ss_data.drop(ss_data[ss_data['latitude'] < -37.85019].index, inplace = True)
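This is the second time the same four bounds have been applied by hand (schools earlier, services here). A small helper (hypothetical name `filter_bbox`) would keep the bounds in one place:

```python
import pandas as pd

# CBD bounding box used for both the schools and the services data.
LON_MIN, LON_MAX = 144.88824, 145.00226
LAT_MIN, LAT_MAX = -37.85019, -37.77682

def filter_bbox(df, lon_col, lat_col):
    """Keep only rows whose point falls inside the CBD bounding box."""
    keep = (df[lon_col].between(LON_MIN, LON_MAX)
            & df[lat_col].between(LAT_MIN, LAT_MAX))
    return df[keep]

# e.g. ss_data = filter_bbox(ss_data, 'longitude', 'latitude')
#      sl_data = filter_bbox(sl_data, 'X', 'Y')
```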
We can now plot the data into a visualisation using a scatter mapbox:
fig5 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude', # plot on latitude and longitude
mapbox_style="carto-darkmatter", # style of map
zoom=12.15, # set initial zoom
center = {"lat": -37.813, "lon": 144.945}, # centre of the map
opacity=0.7, # opacity of each marker/dot
hover_name="name", # only display the service name when hovered
hover_data={"name":False,"what":True,"latitude":False,"longitude":False}, # data displayed
labels={'what':'Service'}, # change labels
title = 'Free and Cheap Services Locations 2020', # title of map
width=950, height=800) # size of map
fig5.update_traces(marker={'size': 10}) # change the marker size
fig5.show()
Awesome! We now have the locations of free and cheap support services plotted around Melbourne!
We have now plotted and analysed all of our required datasets, so we are going to combine them all into one visualisation that we can interact with and see the different areas of liveability.
To do so, we are going to create a base layer and then plot each data on top of this base layer. First, let's create the base layer and set the default styles and concepts of the map:
# Create the base figure to which layers(traces) will be added.
output_fig = go.Figure()
# Set the default style for the map
output_fig.update_layout(mapbox_style="carto-darkmatter")
output_fig.update_layout(hovermode='closest')
output_fig.update_layout(mapbox_center_lat=-37.813, mapbox_center_lon=144.945, mapbox_zoom=12.15)
output_fig.update_layout(width=975, height=800)
output_fig.update_layout(title='Liveability Visualisation Analysis')
output_fig.update_layout(coloraxis_colorscale='viridis')
# Assigning the returned figure prevents Jupyter from printing it to screen
output_fig = output_fig.update_layout(coloraxis_colorbar={'title':'Jobs/Dwellings'})
Now that the base map has been set, we can add each of the dataset visualisations to the base layer. The first layers we will add are the single locational plots, i.e., the School Locations and the Health/Services Locations. To add these layers, we first create new maps for them with different characteristics from those above, so we can combine them into a single visualisation.
First, though, we are going to add a new column to the locational data with the constant values 'Schools' and 'Health/Services'. We are doing this so we can create a legend on our final visualisation showing which marker is which.
sl_data['Plot_Type'] = 'Schools'
ss_data['Plot_Type'] = 'Health/Services'
sl_data.head(5)
| School_Name | School_Type | X | Y | Plot_Type | |
|---|---|---|---|---|---|
| 29 | Flemington Primary School | Primary | 144.93392 | -37.78067 | Schools |
| 30 | Footscray Primary School | Primary | 144.89267 | -37.79838 | Schools |
| 51 | Fitzroy Primary School | Primary | 144.98151 | -37.79960 | Schools |
| 70 | South Yarra Primary School | Primary | 144.98562 | -37.84135 | Schools |
| 183 | Albert Park Primary School | Primary | 144.95277 | -37.84188 | Schools |
We can now see that there is a new column called Plot_Type that will indicate whether a marker is a school or a service. Let's now create the traces for each marker set and add them to our base plot.
The code below also plots both sets of data twice as seen in visfig1 and visfig2. We are simply creating a smaller marker plot with more opacity so it gives the final marker a better visual appeal. See below output markers compared to above visualisations to compare.
# School location data
fig1 = px.scatter_mapbox(sl_data, lat="Y", lon="X",
hover_name="School_Name",
hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
labels={'School_Name':'School Name', 'School_Type':'School Type'},
opacity=0.5,
color_discrete_sequence=['blue'],
color='Plot_Type')
# Change marker size for school location points
fig1.update_traces(marker={'size':12})
# Service location data
fig2 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude',
hover_name="name",
hover_data={"name":False,"what":False,"latitude":False,"longitude":False},
opacity=0.5,
color_discrete_sequence=['red'],
color='Plot_Type')
# Change marker size for service location points
fig2.update_traces(marker={'size':12})
# Add same location points but with smaller marker
# This gives nicer marker visualisation look
# Only for visual appeal
####
visfig1 = px.scatter_mapbox(ss_data, lat='latitude', lon='longitude',
hover_name="name",
hover_data={"name":False,"what":False,"latitude":False,"longitude":False},
opacity=1,
color_discrete_sequence=['red'],)
visfig1.update_traces(marker={'size':5})
visfig2 = px.scatter_mapbox(sl_data, lat="Y", lon="X",
hover_name="School_Name",
hover_data={"School_Name":False,"School_Type":False,"X":False,"Y":False},
labels={'School_Name':'School Name', 'School_Type':'School Type'},
opacity=1,
color_discrete_sequence=['blue'],)
visfig2.update_traces(marker={'size':5})
####
# Add both locational maps/traces to the base layer
output_fig.add_trace(fig1.data[0])
output_fig.add_trace(fig2.data[0])
output_fig.add_trace(visfig1.data[0])
output_fig.add_trace(visfig2.data[0])
Great! We now have a visually appealing map of the school and health/services locations around Melbourne. We also want to plot the other liveability characteristics, including residential dwellings and job forecasts. To do so, we will use a similar approach to the above, but swap the scatter mapbox for a choropleth mapbox, and then add the trace to our current base map.
# Create the forecasted jobs plot
fig3 = px.choropleth_mapbox(jobsByArea, geojson=area, locations='featurenam', color='forecasted_jobs',
range_color=(0, jobsByArea['forecasted_jobs'].max()),
featureidkey="properties.featurenam",
hover_name='featurenam',
hover_data={'featurenam':True,'forecasted_jobs':True},
labels={'forecasted_jobs':'Forecasted Jobs','featurenam':'Area'},
opacity=0.3,
)
# Add forecasted jobs layer to the base figure
output_fig.add_trace(fig3.data[0])
# Create the residential dwellings plot
fig4 = px.choropleth_mapbox(dwellingsByBlock, geojson=block, locations='clue_block', color='dwelling_count',
range_color=(0, dwellingsByBlock['dwelling_count'].max()),
featureidkey="properties.block_id",
hover_name='clue_area',
hover_data={'clue_block':True,'dwelling_count':True},
labels={'dwelling_count':'Residential Dwellings', 'clue_block':'CLUE Block ID'},
opacity=0.3
)
# Add residential dwellings layer to the base figure
# Store final plot in variable to prevent printing to screen
output_fig = output_fig.add_trace(fig4.data[0])
Let's also add a tool that allows us to input our own address and have it plotted on the map. To do this, we are going to use Nominatim, the geocoding service that powers OpenStreetMap. Given an address, it returns the corresponding latitude and longitude.
We will take our own address and build a request URL from it. We will then use the requests package to get the JSON response from Nominatim, specifically the latitude and longitude for the given address:
# Input your own address below
# Make sure it is copied exactly as Nominatim has it
address = 'Macarthur Road, Parkville, Melbourne, City of Melbourne, Victoria, 3052, Australia'
# Build the request URL (Nominatim expects the address in the q parameter)
url = 'https://nominatim.openstreetmap.org/search?q=' + urllib.parse.quote(address) + '&format=json'
# Get response (Nominatim's usage policy requires an identifying User-Agent)
response = requests.get(url, headers={'User-Agent': 'liveability-use-case'}).json()
# X Address (Longitude) and Y Address (Latitude)
X_adr = response[0]["lon"]
Y_adr = response[0]["lat"]
# Print to screen
print(f"The latitude of your custom address is: {Y_adr}")
print(f"The longitude of your custom address is: {X_adr}")
The latitude of your custom address is: -37.7892998 The longitude of your custom address is: 144.9570502
Great! Now we have the latitude and longitude of our address. Feel free to use your own address, but make sure it is entered exactly as Nominatim records it.
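Nominatim returns an empty JSON list when it cannot match an address, so it is worth guarding the lookup before indexing into the response. A minimal sketch, with a hypothetical helper name not used in the notebook itself:

```python
# Hypothetical helper: extract (lat, lon) floats from a parsed Nominatim
# JSON response, or return None when the address was not matched.
def parse_geocode(results):
    if not results:  # Nominatim returns [] for unrecognised addresses
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])

# An unmatched address yields None instead of an IndexError.
print(parse_geocode([]))  # → None
print(parse_geocode([{"lat": "-37.7893", "lon": "144.9571"}]))  # → (-37.7893, 144.9571)
```

Wrapping the lookup this way keeps a typo in the address from crashing the notebook at `response[0]`.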
Let's turn our lat/long pair into a data frame so we can add it to our plot.
# Create the data with corresponding labels
data = {'address': [address], # string address
'X': [X_adr], # longitude
'Y': [Y_adr], # latitude
'Plot_Type':'Custom Location'} # plot_type for legend on final visualisation
# Create dataframe of data
adr_df = pd.DataFrame(data)
# Show dataframe
adr_df
| | address | X | Y | Plot_Type |
|---|---|---|---|---|
| 0 | Macarthur Road, Parkville, Melbourne, City of ... | 144.9570502 | -37.7892998 | Custom Location |
Awesome! The personal address is now inside a dataframe with X and Y coordinates, as well as the actual address and plot type (for visualisation). Our next step is to add this to our visualisation above. To do so, we are going to use the previous approach of a scatter mapbox.
Note: This custom address tool will be useful as we can see the exact location and analyse surrounding schools, dwellings, and suburbs with respect to population!
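One way to act on that note is to quantify proximity directly: the distance from the custom address to any plotted point can be computed with the haversine formula. A minimal sketch (the helper name is ours; the school coordinates are taken from the sl_data preview above):

```python
import math

# Great-circle distance in kilometres between two (lat, lon) points.
def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Distance from the Parkville address to Flemington Primary School.
d = haversine_km(-37.7892998, 144.9570502, -37.78067, 144.93392)
print(round(d, 2))  # roughly 2.2 km
```

Applying this over all rows of sl_data would give the nearest school to the custom location.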
# We are going to make multiple traces to make the point more visually appealing, as previously
# First marker point
adr_fig = px.scatter_mapbox(adr_df, lat='Y', lon='X',
hover_name="address",
hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False},
opacity=0.4,
color_discrete_sequence=['green'],)
adr_fig.update_traces(marker={'size': 30})
# Second marker point
adr_fig2 = px.scatter_mapbox(adr_df, lat='Y', lon='X',
hover_name="address",
hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False},
opacity=0.7,
color_discrete_sequence=['limegreen'],)
adr_fig2.update_traces(marker={'size': 20})
# Third marker point
adr_fig3 = px.scatter_mapbox(adr_df, lat='Y', lon='X',
hover_name="address",
hover_data={"address":False,"X":False,"Y":False, "Plot_Type":False},
opacity=0.9,
color_discrete_sequence=['lime'],
color='Plot_Type')
adr_fig3.update_traces(marker={'size': 12})
# Add trace combination of points into the final visualisation
output_fig.add_trace(adr_fig.data[0])
output_fig.add_trace(adr_fig2.data[0])
output_fig = output_fig.add_trace(adr_fig3.data[0])
Now we have the final visualisation stored inside output_fig, but we need to add some usability to it, such as a drop-down menu to change the data displayed on the map.
To do this, we first turn off the choropleth layers so the map starts from a base state for the drop-down menu. We then define the buttons for this menu.
Each button turns on the requested maps/layers when clicked. This is controlled through the 'visible' booleans in args, which correspond to True or False for displaying each trace.
For example: if School Locations and Health/Services Locations are clicked, we require the first 4 figures above to be displayed (remember, 2 were added purely for visual appeal but we still want them shown). This corresponds to a visible boolean sequence of [True, True, True, True, False, False, True].
Note: The final True boolean corresponds to our custom location.
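Rather than hand-typing these boolean sequences, they can be generated from the indices of the traces to show. A minimal sketch (the helper name make_visibility is ours, not part of the notebook):

```python
# Hypothetical helper: build a visibility list for n traces where only
# the traces at the given indices are shown.
def make_visibility(n_traces, visible_indices):
    shown = set(visible_indices)
    return [i in shown for i in range(n_traces)]

# Example: the School Locations || Health/Services Locations view.
print(make_visibility(7, [0, 1, 2, 3, 6]))
# → [True, True, True, True, False, False, True]
```

Generating the lists this way avoids subtle off-by-one mistakes when traces are later added or reordered.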
# Turn off all choropleth layers
output_fig.update_traces(visible=False, selector=dict(type='choroplethmapbox'))
# Add buttons for selection on the visualisations
buttons = [dict(method='update',
                label='School Locations || Health/Services Locations',
                args=[{'visible': [True, True, True, True, False, False, True]}]),
           dict(method='update',
                label='Forecasted Jobs || Schools || Health/Services',
                args=[{'visible': [True, True, True, True, True, False, True]}]),
           dict(method='update',
                label='Residential Dwellings || Schools || Health/Services',
                args=[{'visible': [True, True, True, True, False, True, True]}])
           ]
um_buttons = [{'active':0, 'showactive':True, 'buttons':buttons,
'direction': 'down', 'xanchor': 'left','yanchor': 'bottom', 'x': 0.53, 'y': 1.06}]
map_annotations = [{'text':'Please select a map view to display', 'x': 1, 'y': 1.15,
'showarrow': False, 'font':{'family':'Arial','size':14}}]
# Add features to the visualisation
output_fig.update_layout(updatemenus=um_buttons, annotations=map_annotations)
# Change position of legend
output_fig.update_layout(legend=dict(x=1,y=1.15))
# Display the visualisation
output_fig.show()
Our analysis and visualisations have provided a deeper understanding of the liveability components of Melbourne. Rather than relying on headline rankings, we can now visually explore each suburb and its liveability factors.
We achieved in this analysis:
We learned from this analysis:
Further opportunities:
Conclusion: With the greatest access to residential dwellings (housing), as well as close proximity to the Melbourne CBD, where the most jobs are forecasted, Docklands proves to be the best location in Melbourne. Furthermore, various schools, both primary and secondary, are situated close to Docklands, suggesting it would be a high-scoring suburb to live in.
Awesome work. The interactive map is now complete, and we can input custom locations to see the relevant liveability metrics around Melbourne, including schools, health services, forecasted jobs, and residential dwellings. Thank you for your time!